wking commented Mar 26, 2023

We have occasional cases where admins attempt a rollback, despite long-standing docs:

Only upgrading to a newer version is supported. Reverting or rolling back your cluster to a previous version is not supported. If your update fails, contact Red Hat support.

Deeper history for that content is linked as [2], [3], and [4] in the commit message. With this commit, we'll refuse to accept rollbacks unless the administrator sets Force to waive our guards.
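
A minimal sketch of the comparison this guard relies on, assuming both releases are named with Semantic Versioning 2.0.0 strings and that a target sorting below the current version counts as a rollback. The function names are hypothetical, and the actual precondition is part of the operator's Go code; this Python version only illustrates the ordering rules.

```python
import re

# Semantic Versioning 2.0.0 shape: MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD].
# Leading zeros inside prerelease identifiers are tolerated here for brevity.
_SEMVER = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


def semver_key(version):
    """Return a tuple that sorts in SemVer precedence order (build metadata ignored)."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    prerelease = match.group(4)
    if prerelease is None:
        # A plain release outranks every prerelease of the same MAJOR.MINOR.PATCH.
        return (major, minor, patch, 1, [])
    identifiers = []
    for identifier in prerelease.split("."):
        if identifier.isdigit():
            # Numeric identifiers compare as integers and sort below alphanumeric ones.
            identifiers.append((0, int(identifier), ""))
        else:
            # Alphanumeric identifiers compare as ASCII text.
            identifiers.append((1, 0, identifier))
    return (major, minor, patch, 0, identifiers)


def rollback_failure(current, target):
    """Return a failure message if target sorts below current, else None."""
    if semver_key(target) < semver_key(current):
        return (
            f"{target} is less than the current target {current}, "
            "but rollbacks and downgrades are not recommended"
        )
    return None


if __name__ == "__main__":
    current = "4.14.0-0.ci.test-2023-03-26-044847-ci-op-5s8i2m1z-latest"
    target = "4.14.0-0.ci.test-2023-03-26-044249-ci-op-5s8i2m1z-initial"
    print(rollback_failure(current, target))
```

With the two CI release names used in this pull request, the target differs from the current version only in its last prerelease identifier, which sorts lower as ASCII text, so the sketch reports a rollback.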

openshift-ci bot added the approved label Mar 26, 2023
wking force-pushed the block-rollbacks branch 2 times, most recently from 510a12a to 2e45a4a, March 26, 2023 04:42
wking commented Mar 26, 2023

CI looks good to me:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/918/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-ovn-upgrade-out-of-change/1639850275931951104/artifacts/e2e-agnostic-ovn-upgrade-out-of-change/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[0].acceptedRisks' | grep -v 'no more signatures to check'
Target release version="" image="registry.build04.ci.openshift.org/ci-op-5s8i2m1z/release@sha256:cf2dea703d22d505ff4c8501864f2373d0330ee75f80da360b7cf14f61462a8b" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:cf2dea703d22d505ff4c8501864f2373d0330ee75f80da360b7cf14f61462a8b against keyrings: verifier-public-key-redhat
Forced through blocking failures: Multiple precondition checks failed:
* Precondition "ClusterVersionRollback" failed because of "LowDesiredVersion": 4.14.0-0.ci.test-2023-03-26-044249-ci-op-5s8i2m1z-initial is less than the current target 4.14.0-0.ci.test-2023-03-26-044847-ci-op-5s8i2m1z-latest, but rollbacks and downgrades are not recommended
* Precondition "ClusterVersionRecommendedUpdate" failed because of "NoChannel": Configured channel is unset, so the recommended status of updating from 4.14.0-0.ci.test-2023-03-26-044847-ci-op-5s8i2m1z-latest to 4.14.0-0.ci.test-2023-03-26-044249-ci-op-5s8i2m1z-initial is unknown.
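
The accepted-risk text above suggests how failed preconditions are reported when an update is forced. A minimal sketch of that aggregation follows, assuming each precondition yields a name, a reason, and a message. The class and function names are hypothetical, and the single-failure format is an assumption, since the log above only shows the multiple-failure form.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Failure:
    name: str     # precondition name, e.g. "ClusterVersionRollback"
    reason: str   # short machine-readable reason, e.g. "LowDesiredVersion"
    message: str  # human-readable explanation

    def __str__(self) -> str:
        return f'Precondition "{self.name}" failed because of "{self.reason}": {self.message}'


def summarize(failures: List[Failure]) -> Optional[str]:
    """Collapse failures into one message, or None when every check passed."""
    if not failures:
        return None
    if len(failures) == 1:
        # Assumption: a lone failure is reported without the "Multiple ..." header.
        return str(failures[0])
    bullets = "\n".join(f"* {failure}" for failure in failures)
    return f"Multiple precondition checks failed:\n{bullets}"


def evaluate(failures: List[Failure], force: bool) -> Tuple[bool, Optional[str]]:
    """Return (accepted, note): force waives the guards but records the risk."""
    summary = summarize(failures)
    if summary is None:
        return True, None
    if force:
        return True, f"Forced through blocking failures: {summary}"
    return False, summary


if __name__ == "__main__":
    failures = [
        Failure("ClusterVersionRollback", "LowDesiredVersion", "target is less than current"),
        Failure("ClusterVersionRecommendedUpdate", "NoChannel", "Configured channel is unset"),
    ]
    accepted, note = evaluate(failures, force=True)
    print(accepted)
    print(note)
```

Without force, the same failures would leave the update rejected and the summary would serve as the blocking message.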

wking changed the title from "pkg/payload/precondition/clusterversion/rollback: New precondition" to "OTA-941: pkg/payload/precondition/clusterversion/rollback: New precondition" Mar 29, 2023
openshift-ci-robot added the jira/valid-reference label Mar 29, 2023
openshift-ci-robot commented Mar 29, 2023

@wking: This pull request references OTA-941 which is a valid jira issue.

wking added a commit to wking/ci-tools that referenced this pull request Mar 30, 2023
…e annotation

bee450b (ci-operator: expose ephemeral cluster versions based on
parents, 2020-08-11, openshift#1098) taught assembleReleaseStep to consume the
release.openshift.io/config annotation on an ImageStream to find a
version prefix that actually reflects the 4.y release branch (instead
of using 0.0.1-0 as the prefix).  Those annotations are set on the
app.ci ImageStreams, for example:

  $ oc whoami -c
  default/api-ci-l2s4-p1-openshiftapps-com:6443/wking
  $ oc -n ocp get -o json imagestream 4.13 | jq -r '.metadata.annotations["release.openshift.io/config"] | fromjson | .name'
  4.13.0-0.ci

But they are not set in ImageStreams contained within release images:

  $ oc adm release info -o json registry.ci.openshift.org/ocp/release:4.13.0-0.ci-2023-03-29-224346 | jq '.references | {kind, metadata}'
  {
    "kind": "ImageStream",
    "metadata": {
      "name": "4.13.0-0.ci-2023-03-29-224346",
      "creationTimestamp": "2023-03-29T22:52:54Z",
      "annotations": {
        "release.openshift.io/from-image-stream": "ocp/4.13-2023-03-29-224346"
      }
    }
  }

With this commit, I'm taking the name out of the imported release
ImageStream and trying to parse it as a Semantic Version.  If it
parses, I'm constructing a release.openshift.io/config annotation to
set just the 'name' property to the MAJOR.MINOR.PATCH from that parsed
name.  This should allow cluster-bot runs like:

  launch 4.13.0-0.ci,openshift/cluster-version-operator#918,openshift/hypershift#2318 hypershift-hosted

to build releases named 4.13.0-0-... instead of their current
0.0.1-0-...:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-hypershift-hosted/1641294936391290880/artifacts/release/artifacts/release-payload-latest/image-references | jq -r .metadata.name
  0.0.1-0.test-2023-03-30-043404-ci-ln-h9dwcbk-latest

which is failing to run with [1]:

  Release image is not valid: {
    "lastTransitionTime": "2023-03-30T04:36:29Z",
    "message": "releases before 4.8 are not supported",
    "observedGeneration": 3,
    "reason": "InvalidImage",
    "status": "False",
    "type": "ValidReleaseImage"
  }

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-hypershift-hosted/1641294936391290880#1:build-log.txt%3A203-210
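
A minimal sketch of the annotation construction described in this commit message, assuming the annotation value is a JSON object whose only property is 'name' and that a name which fails to parse leaves the annotations untouched. The function name is hypothetical, and the actual change is in ci-tools' Go code.

```python
import json
import re

CONFIG_ANNOTATION = "release.openshift.io/config"

# Semantic Versioning 2.0.0 shape: MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD].
_SEMVER = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


def config_annotation_for(image_stream_name):
    """Build the config annotation from an ImageStream name, or {} if it is not SemVer."""
    match = _SEMVER.match(image_stream_name)
    if match is None:
        return {}
    major, minor, patch = match.group(1, 2, 3)
    # Keep only MAJOR.MINOR.PATCH and drop the prerelease and build parts.
    return {CONFIG_ANNOTATION: json.dumps({"name": f"{major}.{minor}.{patch}"})}


if __name__ == "__main__":
    print(config_annotation_for("4.13.0-0.ci-2023-03-29-224346"))
    print(config_annotation_for("latest"))
```

A name such as "4.13" has no PATCH component, so it does not parse and produces no annotation in this sketch.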
wking force-pushed the block-rollbacks branch from 2e45a4a to e0f733c, March 31, 2023 00:23
We have occasional cases where admins attempt a rollback, despite
long-standing docs [1]:

  Only upgrading to a newer version is supported. Reverting or rolling
  back your cluster to a previous version is not supported. If your
  update fails, contact Red Hat support.

Deeper history for that content includes [2,3,4].  With this commit,
we'll refuse to accept rollbacks unless the administrator sets 'Force'
to waive our guards.

[1]: https://docs.openshift.com/container-platform/4.12/updating/understanding-openshift-updates.html
[2]: https://github.com/openshift/openshift-docs/blame/60ae4bf339756189b7e72491b79d53764cbe85fa/modules/update-service-overview.adoc#L33
[3]: openshift/openshift-docs@10515fe
[4]: openshift/openshift-docs@7c882ea#diff-769155bafe6ff5307aa01d79f6b07ad34bc72fd8431fef3501a554802d653ee5R52-R53
wking force-pushed the block-rollbacks branch from e0f733c to 9ac845f, March 31, 2023 00:32
openshift-ci bot added the lgtm label Mar 31, 2023
openshift-ci bot commented Mar 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

openshift-ci-robot commented

/retest-required

Remaining retests: 0 against base HEAD dfe5ef5 and 2 for PR HEAD 9ac845f in total

openshift-ci bot commented Mar 31, 2023

@wking: all tests passed!

openshift-merge-robot merged commit 7f63e64 into openshift:master Mar 31, 2023
wking deleted the block-rollbacks branch March 31, 2023 16:40
wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 11, 2023
…ious, not rollback SemVer

In 9ac845f (pkg/payload/precondition/clusterversion/rollback: New
precondition, 2023-03-25, openshift#918), we grew a new precondition that
parsed the current and target releases as semantic versions and
rejected rollbacks.  Admins could use 'force: true' to push through
those (and other) guards, e.g. for testing purposes.

However, we still lacked guards around SemVer increases, like 4.14.z
hopping straight to 4.16.  In most cases, customers will be using an
OpenShift Update Service with a channel, and getting recommended
updates they can use with 'oc adm upgrade --to ...' or the in-cluster
web-console.  But some clusters are not using update services:

* ARO clusters are not subscribed to a channel by default and need to
  opt in [1].
* Some disconnected/restricted-network clusters currently use
  --to-image updates, although I personally think they would be safer
  running a local update service [2].

With this commit, I'm stiffening the previous guard to consume the
previous-version metadata that's baked into release images [3] and
consumed by Cincinnati when creating the update service responses [4].
This isn't as helpful as using an actual update service, because it
will not include information about update risks we discover after
building the release [5].  But it's still strictly stronger than the
outgoing rollback-specific guard, and we haven't had to extend that
previous-release list since the 4.1 release candidates [6].

[1]: https://learn.microsoft.com/en-us/azure/openshift/howto-upgrade#check-for-azure-red-hat-openshift-cluster-upgrades
[2]: https://issues.redhat.com/browse/OTA-821
[3]: https://github.com/openshift/oc/blob/795bf1a6260847ecfc612da2ab11ea2d6e07da16/pkg/cli/admin/release/new.go#L135
[4]: https://github.com/openshift/cincinnati/blob/d77203d472ed5a7e00112c4d8265ba20f5034824/cincinnati/src/plugins/internal/graph_builder/release_scrape_dockerv2/registry/mod.rs#L419
[5]: https://github.com/openshift/cincinnati-graph-data#block-edges
[6]: https://github.com/openshift/cincinnati-graph-data/blob/c09842556a9c5d3920f9f2d004e24b6fb2f3a2de/raw/metadata.json
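
A minimal sketch of the stiffened guard described above, assuming the target release's metadata is a JSON document with 'version' and 'previous' properties. Those property names, the function name, and the sample versions are assumptions for illustration; the actual precondition is part of the operator's Go code.

```python
import json


def previous_version_failure(current_version, target_metadata_json):
    """Return a failure message unless the target lists current_version as a previous version."""
    metadata = json.loads(target_metadata_json)
    previous = metadata.get("previous") or []
    if current_version in previous:
        return None
    target = metadata.get("version", "the target release")
    return f"{current_version} is not listed as a previous version of {target}"


if __name__ == "__main__":
    # Hypothetical metadata for a 4.14.1 release image.
    metadata = json.dumps({"version": "4.14.1", "previous": ["4.13.10", "4.14.0"]})
    print(previous_version_failure("4.14.0", metadata))  # listed, so no failure
    print(previous_version_failure("4.14.2", metadata))  # rollback, so not listed
    print(previous_version_failure("4.12.0", metadata))  # skipped minor, so not listed
```

Because a release can only list versions that existed when it was built, a rollback target never names the newer current version. That is why a membership check covers the rollback case as well as the skipped-minor case.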
wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 15, 2023
…nor-version updates

The previous logic blocked both minor-version increases and
minor-version decreases.  The new logic allows minor-version decreases
and patch updates (where the minor version doesn't change), unless
overrides are in effect.

Minor version decreases will still be blocked by 9ac845f
(pkg/payload/precondition/clusterversion/rollback: New precondition,
2023-03-25, openshift#918), which comes with a more appropriate message.